COCKTAIL: Multi-Core Co-Optimization Framework With Proactive Reliability Management

Abstract

High performance computing (HPC) servers aim to meet an increase in the number and complexity of tasks and, consequently, to address the energy efficiency challenge. In addition to energy efficiency, it is essential to manage the lifetime limitations of power-hungry components (e.g., cores and cache), hence avoiding server failure before its lifetime period. Traditional approaches focus on either using hybrid caches to reduce the leakage power of the traditional static random-access memory (SRAM) cache, thus improving energy efficiency, or on the trade-off between lifetime and performance of multi-core processors. However, these approaches fall short in terms of flexibility and applicability for HPC tasks in terms of multi-parametric optimization including quality-of-service (QoS), reliability, and energy efficiency. As a result, in this paper we propose COCKTAIL, a holistic strategy and framework to jointly optimize the energy efficiency and performance of multi-core processors in the HPC context, while guaranteeing reliability. First, we analyze the best cache technology among SRAM and resistive random access memory (RRAM), within the context of multi-core architectures, to improve energy efficiency and endurance limits with respect to the requirements of HPC tasks. Second, we introduce a novel and efficient proactive queue policy to reorder tasks for execution, considering their end time and the possible reliability effects of the use of caches. Third, we present a dynamic model predictive control (MPC)-based reliability management method to maximize task performance by controlling the frequency, temperature, and lifetime of the target processor. Our results demonstrate that, while consuming similar energy, COCKTAIL provides up to 60% QoS improvement when compared to the latest state-of-the-art techniques in the HPC context. Moreover, our strategy guarantees a design lifetime of longer than 5 years for the whole system.
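
To make the idea of MPC-based frequency and temperature control more concrete, the following is a minimal toy sketch of a receding-horizon controller. It is not the method from the paper: the first-order thermal model, all constants, the exhaustive search over plans, the use of summed frequency as a stand-in for task performance, and the names `next_temp`, `mpc_step`, and `simulate` are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical toy parameters (assumptions, not values from the paper).
FREQS_GHZ = (1.2, 1.8, 2.4)   # candidate core frequencies
T_AMBIENT = 40.0              # ambient temperature, deg C
T_MAX = 80.0                  # temperature cap, deg C
ALPHA = 0.7                   # thermal inertia per control step (0..1)
HEAT_COEFF = 4.0              # steady-state heating per GHz^3


def next_temp(temp, freq):
    """Toy first-order thermal model: relax toward a frequency-dependent steady state."""
    steady = T_AMBIENT + HEAT_COEFF * freq ** 3
    return ALPHA * temp + (1.0 - ALPHA) * steady


def mpc_step(temp, horizon=3):
    """Return the first move of the best feasible frequency plan over the horizon."""
    best_plan, best_work = None, -1.0
    for plan in product(FREQS_GHZ, repeat=horizon):
        t, feasible = temp, True
        for f in plan:
            t = next_temp(t, f)
            if t > T_MAX:          # plan violates the temperature cap
                feasible = False
                break
        work = sum(plan)           # summed frequency as a performance proxy
        if feasible and work > best_work:
            best_plan, best_work = plan, work
    # Fall back to the lowest frequency if no plan satisfies the cap.
    return best_plan[0] if best_plan else min(FREQS_GHZ)


def simulate(steps=20, temp=T_AMBIENT):
    """Run the receding-horizon loop and record (frequency, temperature) pairs."""
    trace = []
    for _ in range(steps):
        f = mpc_step(temp)         # re-plan at every step, apply only the first move
        temp = next_temp(temp, f)
        trace.append((f, temp))
    return trace
```

Calling `simulate()` returns the frequency chosen at each step together with the resulting temperature. In this toy setting the controller starts at the highest frequency and throttles as the predicted temperature approaches the cap. A lifetime-aware variant would add a wear or endurance state to the prediction model, which this sketch leaves out.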

Similar resources

Satellite Conceptual Design Multi-Objective Optimization Using Co Framework

This paper focuses on the development of an efficient method for the conceptual design optimization of a satellite. There are many options for a satellite's subsystems that could be chosen as acceptable solutions for implementing a space system mission. Every option should be assessed against different criteria such as cost, mass, reliability, and technology constraints (complexity). In this rese...

Reliability-Aware Power Management Of Multi-Core Systems (MPSoCs)

Long-term reliability of processors in embedded systems is experiencing growing attention since decreasing feature sizes and increasing power consumption have a negative influence on the lifespan. Among other measures, the reliability can be influenced significantly by Dynamic Power Management (DPM), since it affects the processor’s temperature. Compared to single-core systems reconfigurable mu...

Reliability-Aware Power Management of Multi-Core Processors

Long-term reliability of processors is experiencing growing attention since decreasing feature sizes and increasing power consumption have a negative influence on the lifespan. The reliability can also be influenced by Dynamic Power Management (DPM), since it affects the processor’s temperature. In this paper, it is examined how different DPM strategies for multi-core processors alter their life...

A Proactive Management Framework in Active Clusters

An active Web cluster system is an active network that has a collection of locally distributed servers that are interconnected by active switches, providing a Web application service. In this paper, we introduce the ALBM (Adaptive Load Balancing and Management) active cluster system that provides proactive management. The architecture of the ALBM active cluster and its underlying components are...

Journal

Journal title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Year: 2021

ISSN: 1937-4151, 0278-0070

DOI: https://doi.org/10.1109/tcad.2021.3058959